
SNP Databases

A place to capture information about the various SNP databases that exist. Some are very leading edge, produced as part of sequencing test analysis. Others come out of the slower, more formal refereed-paper process. Eventually, all should feed into the NIH rsID master database (dbSNP, listed below). Also included are sites and tools to convert and look up SNPs. The first table covers Y-chromosome-only (yDNA) databases, since those represent the majority. The second table covers generic SNP databases across all DNA.

Most file formats define the chromosome and the location within the chromosome. Some then add either an "rsID" identifier (from dbSNP, listed below) or, in the case of yDNA, an SNP name. In a few cases, the files use some other, internal company identifier, but the chromosome and location can still be used to get an equivalent "rsID". Often, these identifiers were defined before submission to the database and the formats were never updated. The NGG file formats for mtDNA and yDNA specifically use only the SNP name and no other identification (no "rsID" and no location; the chromosome / DNA type is known from the context of the enclosing file).
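The position-based fallback described above can be sketched as follows. This is a minimal illustration, assuming a small in-memory index keyed by (chromosome, position); the function name `resolve_id`, the index layout, and the sample entries are hypothetical placeholders, not real dbSNP or vendor data.

```python
from typing import Dict, Optional, Tuple

# Hypothetical reference index: (chromosome, position) -> rsID.
# The entries are made-up placeholders, not real dbSNP records.
REFERENCE_INDEX: Dict[Tuple[str, int], str] = {
    ("Y", 1000): "rs0000001",
    ("1", 2500): "rs0000002",
}


def resolve_id(chrom: str, pos: int, given_id: Optional[str] = None) -> Optional[str]:
    """Return an rsID for a record, falling back to a position lookup."""
    # Keep an identifier that already looks like an rsID.
    if given_id and given_id.startswith("rs"):
        return given_id
    # Otherwise (SNP name, internal company ID, or nothing) use the coordinates.
    return REFERENCE_INDEX.get((chrom, pos))


print(resolve_id("Y", 1000, "internal-42"))  # found by position
print(resolve_id("1", 2500, "rs9"))          # existing rsID is kept
print(resolve_id("2", 1))                    # unknown location -> None
```

A lookup like this only works when the file and the index use the same reference build, since positions differ between builds.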

Y Chromosome Databases
DB | Site | SNP Pre-fix | Notes
YBrowse DB | ISOGG | | By Thomas Krahn (DNA Fingerprint / FTDNA, YSEQ); contains SNP names and locations
SNP Index | ISOGG | | For SNPs placed in their tree
BY SNP Index @ ISOGG | FTDNA | BY | Complete list, including early named FTDNA SNPs before full confirmation (mostly since the BY500 test was introduced) (227,533 entries as of Dec 2018)
FT SNP Index @ ISOGG | FTDNA | FT | New list of additional SNPs (mostly since the BY700 test was introduced). Oddly, it claims to replace the BY list but is a much, much smaller list. (92,626 entries as of Jul 2019)
FGC SNP Index @ ISOGG | FGC | FGC | Ditto for FGC
SNP-List | yFull | Y | Very centric on their own nomenclature; difficult to find comparable names already identified in other DBs
Mutations | yTree | - | List is too short to be a complete list of the variants they cover
Variants | Haplogroup-R | HR |


Generic SNP Databases
DB | Site | SNP Pre-fix | Notes
dbSNP | NIH | rs | Home of the rsIDs (in both Build37 and Build38; SNP names and genes?)
SNPedia | | | Same people behind the Promethease site (see also MIT Tech Review article)
SNP-Nexus | | |
Ensembl and BioMart | EMBL-EBI | | More than just the human genome
European Variation Archive | | | More than just the human genome

OK, not really a SNP database or tool, but MapS Converter allows base-pair locus conversions between Build36 and Build37, and length conversions between base-pair start/stop and centiMorgans.
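To give a feel for the base-pair to centiMorgan length conversion mentioned above, here is a minimal sketch using linear interpolation over a genetic map. How MapS Converter works internally is not stated here, so this is an assumed, commonly used approach rather than that tool's method; the map values and the function names `bp_to_cm` and `segment_length_cm` are invented for illustration.

```python
from bisect import bisect_right

# Toy genetic map for one chromosome: (base-pair position, cumulative cM).
# These values are invented for illustration only.
GENETIC_MAP = [(0, 0.0), (1_000_000, 1.0), (3_000_000, 2.0)]


def bp_to_cm(pos: int) -> float:
    """Linearly interpolate the cumulative cM value at a base-pair position."""
    positions = [bp for bp, _ in GENETIC_MAP]
    # Clamp positions that fall outside the mapped range.
    if pos <= positions[0]:
        return GENETIC_MAP[0][1]
    if pos >= positions[-1]:
        return GENETIC_MAP[-1][1]
    i = bisect_right(positions, pos)  # index of the first map point beyond pos
    (bp0, cm0), (bp1, cm1) = GENETIC_MAP[i - 1], GENETIC_MAP[i]
    return cm0 + (cm1 - cm0) * (pos - bp0) / (bp1 - bp0)


def segment_length_cm(start: int, stop: int) -> float:
    """Length in cM of a segment given by base-pair start/stop positions."""
    return bp_to_cm(stop) - bp_to_cm(start)


print(segment_length_cm(500_000, 2_000_000))
```

Converting a locus between builds is a different problem: it needs a build-to-build coordinate mapping and cannot be done by interpolation alone.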

Microarray Result Databases
  • See Microarray Databases on Wikipedia for a list of curated entry databases.
  • openSNP for a user-submitted, unreviewed, public database

Sequencing Result Databases

To get some sense of content, here are some figures for some of these databases:
yBrowse.org
snps_hg38.csv file dated 2 Aug 2020
# Entries: 1,263,058 (with 7,433 of them marked InDels)
# FT Entries: 362,547 (whereas the corresponding FTDNA FT Spreadsheet has 362,012)
# BY Entries: 264,697 (whereas the corresponding FTDNA BY Spreadsheet has 230,312)
# Y Entries: 195,034 (whereas yFull.com/snp-list as of 2 Aug 2020 has 205,115)
# YP Entries: 7,113 (whereas yFull.com/yp/snp-list as of 2 Aug 2020 has 6,442)
# FGC Entries: 125,598 (whereas the corresponding FGC spreadsheet has 65,536)
It is not clear how aliases for the same SNP (that is, the same location with multiple names) are being handled. It is also not clear how many entries in yBrowse are STRs, which they capture only minimally. Other sources indicate 15,000+ total STRs, with the NGS analysis groups FTDNA and yFull both having under 1,000 not-yet-defined in their analysis results.
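Counts per prefix and candidate aliases could be derived along the following lines. This is only a sketch on made-up rows of (SNP name, position); the column layout of the real yBrowse CSV is not described here, so reading the actual file is left out, and the helper `prefix_of` is a hypothetical name.

```python
import re
from collections import Counter, defaultdict

# Made-up sample rows as (SNP name, position); not real yBrowse data.
rows = [
    ("BY100", 5000),
    ("FT200", 5000),   # same position as BY100 -> candidate alias pair
    ("FGC300", 6000),
    ("Y400", 7000),
    ("YP500", 8000),
]


def prefix_of(name: str) -> str:
    """Leading letters of a SNP name, e.g. 'BY100' -> 'BY'."""
    match = re.match(r"[A-Za-z]+", name)
    return match.group(0) if match else ""


# Tally entries per naming prefix (BY, FT, FGC, Y, YP, ...).
prefix_counts = Counter(prefix_of(name) for name, _ in rows)

# Group names by position; positions with several names are candidate aliases.
names_by_position = defaultdict(list)
for name, pos in rows:
    names_by_position[pos].append(name)
aliases = {pos: names for pos, names in names_by_position.items() if len(names) > 1}

print(prefix_counts)
print(aliases)
```

Grouping by position alone is a simplification: two names at the same position may describe different mutations, so a real comparison would also need to check the alleles.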

dbSNP
Reference GRCh38 release from file GCF_000001405.38.gz dated 14 May 2020
SN | # Entries | # InDels
NC_000001.11 (chr1) | 54,954,827 | 3,490,824
NC_000002.12 (chr2) | 58,785,283 | 3,732,571
NC_000003.12 ... | 48,086,717 | 3,022,655
NC_000004.12 | 46,216,186 | 2,947,022
NC_000005.10 | 43,329,407 | 2,739,837
NC_000006.12 | 40,561,773 | 2,651,154
NC_000007.14 | 38,926,341 | 2,535,153
NC_000008.11 | 36,812,460 | 2,210,783
NC_000009.12 | 30,541,428 | 1,865,518
NC_000010.11 | 32,443,993 | 2,076,655
NC_000011.10 | 33,240,682 | 2,047,744
NC_000012.12 | 32,136,494 | 2,100,312
NC_000013.11 | 23,659,495 | 1,538,386
NC_000014.9 | 21,616,309 | 1,403,950
NC_000015.10 | 20,228,514 | 1,322,963
NC_000016.10 | 22,226,258 | 1,337,067
NC_000017.11 | 19,759,284 | 1,345,501
NC_000018.10 | 18,734,309 | 1,188,677
NC_000019.10 | 15,192,343 | 1,070,367
NC_000020.11 | 15,416,556 | 976,361
NC_000021.9 | 9,241,876 | 614,928
NC_000022.11 (chr22) | 9,625,604 | 639,800
NC_000023.11 (X) | 27,812,735 | 1,713,673
NC_000024.10 (Y) | 1,665,288 | 107,759
TOTAL | 701,214,162 [1] | 44,679,660
[1] It is reported that the number of actual, curated entries in dbSNP is around 100 million.
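As a cross-check, the per-chromosome figures listed above can be summed directly. The sketch below copies those figures into a dictionary and also shows how the RefSeq accessions map to chromosome labels (NC_000001 to NC_000022 are the autosomes, NC_000023 is X, NC_000024 is Y). The names `COUNTS` and `accession_to_chromosome` are illustrative only.

```python
# Per-chromosome figures copied from the table above: accession -> (entries, indels).
COUNTS = {
    "NC_000001.11": (54_954_827, 3_490_824),
    "NC_000002.12": (58_785_283, 3_732_571),
    "NC_000003.12": (48_086_717, 3_022_655),
    "NC_000004.12": (46_216_186, 2_947_022),
    "NC_000005.10": (43_329_407, 2_739_837),
    "NC_000006.12": (40_561_773, 2_651_154),
    "NC_000007.14": (38_926_341, 2_535_153),
    "NC_000008.11": (36_812_460, 2_210_783),
    "NC_000009.12": (30_541_428, 1_865_518),
    "NC_000010.11": (32_443_993, 2_076_655),
    "NC_000011.10": (33_240_682, 2_047_744),
    "NC_000012.12": (32_136_494, 2_100_312),
    "NC_000013.11": (23_659_495, 1_538_386),
    "NC_000014.9": (21_616_309, 1_403_950),
    "NC_000015.10": (20_228_514, 1_322_963),
    "NC_000016.10": (22_226_258, 1_337_067),
    "NC_000017.11": (19_759_284, 1_345_501),
    "NC_000018.10": (18_734_309, 1_188_677),
    "NC_000019.10": (15_192_343, 1_070_367),
    "NC_000020.11": (15_416_556, 976_361),
    "NC_000021.9": (9_241_876, 614_928),
    "NC_000022.11": (9_625_604, 639_800),
    "NC_000023.11": (27_812_735, 1_713_673),
    "NC_000024.10": (1_665_288, 107_759),
}


def accession_to_chromosome(accession: str) -> str:
    """Map a RefSeq accession such as 'NC_000024.10' to a label such as 'Y'."""
    number = int(accession.split(".")[0].split("_")[1])
    if 1 <= number <= 22:
        return str(number)
    if number in (23, 24):
        return "X" if number == 23 else "Y"
    raise ValueError(f"not a nuclear chromosome accession: {accession}")


total_entries = sum(entries for entries, _ in COUNTS.values())
total_indels = sum(indels for _, indels in COUNTS.values())

print(f"entries: {total_entries:,}")  # entries: 701,214,162
print(f"indels:  {total_indels:,}")   # indels:  44,679,660
print(f"Y share of entries: {COUNTS['NC_000024.10'][0] / total_entries:.2%}")
```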

See Also

Microarray File Formats, Sequencing File Formats

External References